Added roc_auc as a fit_choice #5
Conversation
we should think more about how each sample contributes to roc_auc and see if we can write a "vectorized" function for lexicase.
I don't have any ideas yet for the typical 1 error per sample, but one way to get a vector of 4 errors would be to report the entire confusion matrix. Something like: I am not sure if that is a good idea... I have only seen lexicase used where there is 1 error per sample.
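As a hedged illustration of the confusion-matrix idea, the four entries could be flattened into one vector per individual. The function name and the `[tn, fp, fn, tp]` ordering here are assumptions for illustration, not part of FEW:

```python
def confusion_matrix_vector(y_true, y_pred):
    """Flatten the 2x2 confusion matrix into [tn, fp, fn, tp].

    Assumes binary 0/1 labels. fp and fn are already error counts;
    tn and tp would need inverting (e.g. n_pos - tp) to serve as
    minimized lexicase cases.
    """
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            tp += p       # predicted 1 on a true 1
            fn += 1 - p   # predicted 0 on a true 1
        else:
            fp += p       # predicted 1 on a true 0
            tn += 1 - p   # predicted 0 on a true 0
    return [tn, fp, fn, tp]
```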
i don't think 4 errors would work, since that is not enough cases for lexicase to perform well. however, with
I am not sure what you mean by raw roc values. Would that just be the true positive rate at a particular value for false positive rate? Would that necessarily be 1 per sample?
yes, i guess it would be the true positive rate as the threshold increases. there is no hard requirement in lexicase that there be 1 case per sample; that's just normally how it's mapped. the important thing is, roughly, that there are many cases (more than, say, 15). I'm thinking something like

```python
def roc_fit(y_true, y_pred):
    fpr, tpr, _ = roc_curve(y_true, y_pred)
    return 1 - tpr
```

could work for your purposes as a 'vectorized' fitness function.
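For concreteness, here is a self-contained sketch of that idea that hand-rolls the ROC computation instead of calling sklearn's `roc_curve`. The function name and the tie-handling convention are assumptions for illustration, not FEW's code:

```python
def roc_error_vector(y_true, y_score):
    """Return 1 - TPR at each distinct score threshold (descending).

    A hand-rolled stand-in for sklearn's roc_curve: one error value
    per threshold, so lexicase gets a vector of cases per individual.
    Assumes binary 0/1 labels with at least one positive sample.
    """
    # Visit samples from highest score to lowest
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    n_pos = sum(y_true)
    tp = 0
    errors = []
    for idx, i in enumerate(order):
        tp += y_true[i]
        # Only emit a point where the threshold actually changes
        if idx == len(order) - 1 or y_score[order[idx + 1]] != y_score[i]:
            errors.append(1 - tp / n_pos)
    return errors

# Example: 4 samples give (up to) 4 cases, one per distinct threshold
errors = roc_error_vector([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```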
In your
entire arrays. y_pred is the feature output. i should clarify that i think this would make lexicase selection work, but i'm not sure it's the best way to formulate the problem. i'm also unclear on how ROC works when you have an arbitrarily scaled floating-point vector for y_pred, which could be the case with a program's output in FEW.
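On the arbitrary-scaling question: ROC depends only on how y_pred ranks the samples, so any strictly increasing rescaling of a program's raw output leaves the curve unchanged. A small self-contained sketch of that invariance (the `roc_points` helper is illustrative, not sklearn's `roc_curve`):

```python
import math

def roc_points(y_true, y_score):
    """(FPR, TPR) at each distinct score threshold, descending. Sketch only."""
    order = sorted(range(len(y_score)), key=lambda i: -y_score[i])
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = []
    for idx, i in enumerate(order):
        if y_true[i]:
            tp += 1
        else:
            fp += 1
        if idx == len(order) - 1 or y_score[order[idx + 1]] != y_score[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points

y_true = [1, 0, 1, 0, 1]
raw = [3.2, -1.5, 0.7, 0.1, 9.9]                   # arbitrarily scaled output
squashed = [1 / (1 + math.exp(-s)) for s in raw]   # strictly monotone rescaling
# The sigmoid preserves the ordering, so the ROC points are identical
same = roc_points(y_true, raw) == roc_points(y_true, squashed)
```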
When using a logistic regression classifier I assumed that the model set as the
oh! no. the feature transformations each get their own fitness to determine which survive. this is a separate step from evaluating the performance of the ML method with which FEW is paired. currently, that scoring function is specific to the ML method.
So is the
yes. check out the gecco paper where we define & compare different test metrics. i looked into the roc_curve metric in sklearn a bit more, and it seems like you need an estimator with a decision function to get a reasonable result. is that right?
Tested on a single sample dataset and it seems to work well.
Currently, it is not compatible with any lexicase selection variants because there is no function that returns a vector of roc auc values. I am not sure what such a function would look like, because it is impossible to compute the roc auc of a single prediction.
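For context on why a per-case error vector matters here, lexicase selection filters a population case by case, so it needs a vector of errors per individual rather than one scalar score. A minimal, illustrative sketch (not FEW's implementation):

```python
import random

def lexicase_select(error_vectors, rng=random):
    """Pick one individual's index by lexicase selection.

    error_vectors: equal-length error lists, one per individual,
    lower is better. Cases are visited in random order; at each case
    only individuals with the best error on that case survive.
    Illustrative sketch only.
    """
    candidates = list(range(len(error_vectors)))
    cases = list(range(len(error_vectors[0])))
    rng.shuffle(cases)
    for c in cases:
        best = min(error_vectors[i][c] for i in candidates)
        candidates = [i for i in candidates if error_vectors[i][c] == best]
        if len(candidates) == 1:
            break
    # Ties after all cases are broken at random
    return rng.choice(candidates)
```

With a scalar ROC AUC there is only one "case", which collapses lexicase to plain truncation selection; hence the need for a vector-returning variant.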